Fix multiple grammar and spelling mistakes in README #1062

Merged 1 commit on Dec 2, 2020
38 changes: 19 additions & 19 deletions README.md
@@ -10,7 +10,7 @@ the audio domain. By supporting PyTorch, torchaudio follows the same philosophy
of providing strong GPU acceleration, having a focus on trainable features through
the autograd system, and having consistent style (tensor names and dimension names).
Therefore, it is primarily a machine learning library and not a general signal
processing library. The benefits of PyTorch is be seen in torchaudio through
processing library. The benefits of PyTorch can be seen in torchaudio through
having all the computations be through PyTorch operations which makes it easy
to use and feel like a natural extension.

@@ -32,7 +32,7 @@ Dependencies
* libsox v14.3.2 or above (only required when building from source)
* [optional] vesis84/kaldi-io-for-python commit cb46cb1f44318a5d04d4941cf39084c5b021241e or above

The following is the corresponding ``torchaudio`` versions and supported Python versions.
The following are the corresponding ``torchaudio`` versions and supported Python versions.

| ``torch`` | ``torchaudio`` | ``python`` |
| ------------------------ | ------------------------ | ------------------------------- |
@@ -46,7 +46,7 @@ The following is the corresponding ``torchaudio`` versions and supported Python
Installation
------------

### Binary Distibutions
### Binary Distributions

To install the latest version using anaconda, run:

@@ -127,7 +127,7 @@ BUILD_SOX=1 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py i
```

This is known to work on linux and unix distributions such as Ubuntu and CentOS 7 and macOS.
If you try this on a new system and found a solution to make it work, feel free to share it by opening and issue.
If you try this on a new system and find a solution to make it work, feel free to share it by opening an issue.

#### Troubleshooting

@@ -195,16 +195,16 @@ Conventions

With torchaudio being a machine learning library and built on top of PyTorch,
torchaudio is standardized around the following naming conventions. Tensors are
assumed to have channel as the first dimension and time as the last
Review comment from @vincentqb (Contributor), Dec 15, 2020:

nit: the choice of singular for channel here and below was intentional :)

channel was used as "the name of the channel dimension" in a way similar to "the mel dimension", "the dimension of freq", time. Plural was avoided in order to avoid conjugating all other ones :)

It would have been a good idea to use single or double quotes instead to emphasize that this is the intended name.

As mentioned below, the number of channel would then be n_channel, in a way similar to n_freq or n_mel.

assumed to have channels as the first dimension and time as the last
dimension (when applicable). This makes it consistent with PyTorch's dimensions.
For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
whereas dimension names do not have this prefix (e.g. "a tensor of
dimension (channel, time)")
dimension (channels, time)")

* `waveform`: a tensor of audio samples with dimensions (channel, time)
* `waveform`: a tensor of audio samples with dimensions (channels, time)
* `sample_rate`: the rate of audio dimensions (samples per second)
* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
* `specgram`: a tensor of spectrogram with dimensions (channels, freq, time)
* `mel_specgram`: a mel spectrogram with dimensions (channels, mel, time)
* `hop_length`: the number of samples between the starts of consecutive frames
* `n_fft`: the number of Fourier bins
* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
@@ -216,16 +216,16 @@ dimension (channel, time)")

Transforms expect and return the following dimensions.

* `Spectrogram`: (channel, time) -> (channel, freq, time)
* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
* `MelScale`: (channel, freq, time) -> (channel, mel, time)
* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
* `MFCC`: (channel, time) -> (channel, mfcc, time)
* `MuLawEncode`: (channel, time) -> (channel, time)
* `MuLawDecode`: (channel, time) -> (channel, time)
* `Resample`: (channel, time) -> (channel, time)
* `Fade`: (channel, time) -> (channel, time)
* `Vol`: (channel, time) -> (channel, time)
* `Spectrogram`: (channels, time) -> (channels, freq, time)
* `AmplitudeToDB`: (channels, freq, time) -> (channels, freq, time)
* `MelScale`: (channels, freq, time) -> (channels, mel, time)
* `MelSpectrogram`: (channels, time) -> (channels, mel, time)
* `MFCC`: (channels, time) -> (channel, mfcc, time)
* `MuLawEncode`: (channels, time) -> (channels, time)
* `MuLawDecode`: (channels, time) -> (channels, time)
* `Resample`: (channels, time) -> (channels, time)
* `Fade`: (channels, time) -> (channels, time)
* `Vol`: (channels, time) -> (channels, time)
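
The shape conventions listed above can be illustrated with a small, library-free sketch. It does not call torchaudio. The helper names, the default `n_fft=400` and `hop_length=200`, and the frame-count formula (a one-sided spectrum with centered, padded frames) are all assumptions made for illustration, so actual output sizes may differ depending on the transform's settings.

```python
def spectrogram_shape(waveform_shape, n_fft=400, hop_length=200):
    """Map a (channel, time) shape to a (channel, freq, time) shape.

    Hypothetical helper: assumes a one-sided spectrum and centered frames.
    """
    n_channel, n_time = waveform_shape
    n_freq = n_fft // 2 + 1             # one-sided spectrum keeps n_fft // 2 + 1 bins
    n_frame = n_time // hop_length + 1  # frame count under the centered-frame assumption
    return (n_channel, n_freq, n_frame)


def mel_spectrogram_shape(waveform_shape, n_mel=128, n_fft=400, hop_length=200):
    """Map a (channel, time) shape to a (channel, mel, time) shape."""
    n_channel, _, n_frame = spectrogram_shape(waveform_shape, n_fft, hop_length)
    return (n_channel, n_mel, n_frame)  # only the freq axis is replaced by mel


# One second of stereo audio at an assumed sample_rate of 16000.
stereo_second = (2, 16000)
print(spectrogram_shape(stereo_second))
print(mel_spectrogram_shape(stereo_second))
```

In both cases the channel dimension stays first and the time dimension stays last, which is the convention the list describes.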

Complex numbers are supported via tensors of dimension (..., 2), and torchaudio provides `complex_norm` and `angle` to convert such a tensor into its magnitude and phase. Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.
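
As a rough illustration of the (..., 2) layout described above, the sketch below stores each complex value as a (real, imaginary) pair in the last dimension and converts it to magnitude and phase using only the standard library. The helper names `magnitude` and `phase` are hypothetical and are not torchaudio functions; they only mirror what `complex_norm` and `angle` are described as computing.

```python
import math


def magnitude(pair):
    """Magnitude of a complex value stored as a (real, imag) pair."""
    real, imag = pair
    return math.hypot(real, imag)


def phase(pair):
    """Phase in radians of a complex value stored as a (real, imag) pair."""
    real, imag = pair
    return math.atan2(imag, real)


# A toy "tensor" of dimension (time, 2): three complex values over time.
frames = [(3.0, 4.0), (0.0, 1.0), (-1.0, 0.0)]

magnitudes = [magnitude(p) for p in frames]  # dimension (time,)
phases = [phase(p) for p in frames]          # dimension (time,)
print(magnitudes)
print(phases)
```

The trailing dimension of size 2 is consumed by the conversion, while any leading dimensions (here only time) are preserved.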
